Intelligent Assistance for the Data Mining Process: An Ontology-based Approach
نویسندگان
چکیده
A data mining (DM) process involves multiple stages. A simple, but typical, process might include preprocessing data, applying a data-mining algorithm, and postprocessing the mining results. There are many possible choices for each stage, and only some combinations are valid. Because of the large space and non-trivial interactions, both novices and data-mining specialists need assistance in composing and selecting DM processes. We present the concept of Intelligent Discovery Assistants (IDAs), which provide users with (i) systematic enumerations of valid DM processes, in order that important, potentially fruitful options are not overlooked, and (ii) effective rankings of these valid processes by different criteria, to facilitate the choice of DM processes to execute. We use a prototype to show that an IDA can indeed provide useful enumerations and effective rankings. We discuss how an IDA is an important tool for knowledge sharing among a team of data miners. Finally, we illustrate all the claims with a comprehensive demonstration using a more involved process and data from the 1998 KDDCUP competition.
منابع مشابه
Ontology-guided intelligent data mining assistance: Combining declarative and procedural knowledge
The effective application of a data mining process is littered with many difficult and technical decisions (i.e. data cleansing, feature transformations, algorithms, parameters, evaluation). Subsequently, most data mining products provide a large number of models and tools, but few provide intelligent assistance for addressing the above-mentioned challenges that face the non-specialist data min...
متن کاملToward intelligent data warehouse mining: An ontology-integrated approach for multi-dimensional association mining
A data warehouse is an important decision support system with cleaned and integrated data for knowledge discovery and data mining systems. In reality, the data warehouse mining system has provided many applicable solutions in industries, yet there are still many problems causing users extra problems in discovering knowledge or even failing to obtain the real and useful knowledge they need. To i...
متن کاملA New Ontology-Based Approach for Human Activity Recognition from GPS Data
Mobile technologies have deployed a variety of Internet–based services via location based services. The adoption of these services by users has led to mammoth amounts of trajectory data. To use these services effectively, analysis of these kinds of data across different application domains is required in order to identify the activities that users might need to do in different places. Researche...
متن کاملEntropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
متن کاملPrioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler it is not an simple task to download the domain specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed, among them a key technique is focused crawling which is able to crawl particular topical...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2002